Scalable and accurate knowledge discovery in real world databases
Abstract
Abstraction: Meta-data are given at different levels of abstraction: a conceptual (abstract) level and a relational (executable) level. This makes an abstract case understandable and reusable.

Data documentation: All attributes, together with the database tables and views that are input to a preprocessing chain, are explicitly listed in both the conceptual and the relational part of the meta-data. An ontology allows all data to be organized, e.g. by distinguishing between concepts of the domain and relationships between these concepts. For every entity involved there is a text field for documentation. This makes the data much more understandable, e.g. to human domain experts, than references to the names of specific database objects alone. Furthermore, statistics and features important for data mining (e.g., the presence of null values) are accessible as well. This augments the meta-data usually found in relational databases and gives a good overview of the data sets at hand.

Case documentation: The chain of preprocessing operators is documented as well. First of all, the declarative definition of an executable case in the M4 model already serves as documentation. Furthermore, apart from the opportunity to use “speaking names” for steps and data objects, there are text fields to document all steps of a case together with their parameter settings. This helps to quickly determine the relevance of each step and makes cases reproducible.

Ease of case adaptation: To run a given sequence of operators on a new database, only the relational meta-data and their mapping to the conceptual meta-data have to be defined. A sales prediction case can, for instance, be applied to different kinds of shops, and a standard sequence of steps for preparing time series for a specific learner might even serve as a template that applies to very different mining contexts. The same effect eases the maintenance of cases when the database schema changes over time: the user just needs to update the corresponding links from the conceptual to the relational level. This is especially easy when all abstract M4 entities are documented.

The MININGMART project has developed a model for meta-data together with its compiler, and has implemented human-computer interfaces that allow database managers and case designers
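The separation described under "Ease of case adaptation" can be illustrated with a minimal Python sketch. It is not the M4 model or the MININGMART compiler: the class `Step`, the operator name `drop_nulls`, the functions `compile_step` and `compile_case`, and all table and column names are hypothetical and chosen only for illustration. The sketch assumes that a case is a list of steps defined purely on conceptual names, and that a separate mapping links each conceptual attribute to a table and column.

```python
from dataclasses import dataclass

# Hypothetical illustration only: these names do not reproduce the M4 schema.


@dataclass(frozen=True)
class Step:
    """One preprocessing step, defined on conceptual names only."""
    name: str        # "speaking name" of the step
    operator: str    # hypothetical operator identifier
    concept: str     # conceptual entity, e.g. "Sale"
    feature: str     # conceptual attribute, e.g. "amount"
    doc: str = ""    # free-text documentation of the step


# Relational meta-data: (concept, feature) -> (table, column)
Mapping = dict[tuple[str, str], tuple[str, str]]


def compile_step(step: Step, mapping: Mapping) -> str:
    """Resolve conceptual names through the mapping and emit SQL."""
    table, column = mapping[(step.concept, step.feature)]
    if step.operator == "drop_nulls":
        return f"SELECT * FROM {table} WHERE {column} IS NOT NULL"
    raise ValueError(f"unknown operator: {step.operator}")


def compile_case(case: list[Step], mapping: Mapping) -> list[str]:
    """Compile every step of a case against one relational mapping."""
    return [compile_step(step, mapping) for step in case]


# The case is written once, at the conceptual level.
CASE = [
    Step(
        name="remove incomplete sales",
        operator="drop_nulls",
        concept="Sale",
        feature="amount",
        doc="Rows without an amount cannot be used for prediction.",
    )
]

# Two databases with different schemas need only different mappings.
SHOP_A: Mapping = {("Sale", "amount"): ("sales", "amt")}
SHOP_B: Mapping = {("Sale", "amount"): ("tx_2007", "total_eur")}

if __name__ == "__main__":
    for label, mapping in (("shop A", SHOP_A), ("shop B", SHOP_B)):
        print(label, compile_case(CASE, mapping))
```

In this sketch, adapting the case to a new shop or to a changed schema means editing only the mapping dictionary; the step definitions and their documentation stay untouched.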
Similar resources
Data Mining & Knowledge Discovery in Databases: An AI Perspective
Data mining and knowledge discovery have several important application areas. Data mining and knowledge discovery have been topics considered at many AI, database and statistical conferences. Knowledge discovery generally refers to the process of identifying valid, novel and understandable patterns. Knowledge discovery from large databases, often called data mining, refers to the application of ...
Discovery of Data Dependencies in Relational Databases Lss8 Report 14
Knowledge discovery in databases is not only the nontrivial extraction of implicit, previously unknown and potentially useful information from databases. We argue that in contrast to machine learning, knowledge discovery in databases should be applied to real world databases. Since real world databases are known to be very large, they raise problems of the access. Therefore, real world database...
From Data Mining to Knowledge Discovery in Databases
databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particula...
The Status of Research on Rough Sets for Knowledge Discovery in Databases
Knowledge Discovery in Databases (KDD) has evolved into an important and active area of research because of theoretical challenges and practical applications associated with the problem of discovering (or extracting) interesting and previously unknown knowledge from very large real-world databases. Many aspects of KDD have been investigated in several related fields. The emphasis of ongoing res...
An Intelligent Approach of Rough Set in Knowledge Discovery Databases
Knowledge Discovery in Databases (KDD) has evolved into an important and active area of research because of theoretical challenges and practical applications associated with the problem of discovering (or extracting) interesting and previously unknown knowledge from very large real-world databases. Rough Set Theory (RST) is a mathematical formalism for representing uncertainty that can be consi...
Darwin: A Scalable Integrated System for Data Mining
Darwin is a high-performance scalable integrated system for Data Mining and Knowledge Discovery in large databases. In this paper we present an overview of Darwin’s philosophy, architecture and functionality. We also describe the application of Darwin to selected datasets.
Publication date: 2007